…ck tests
- Add `model: "haiku"` and `max_turns: 5` config for all sub-agents to minimize cost/latency and prevent indefinite hangs
- Move infinite block tests (prompt and command) to a dedicated serial step with both should-fire and should-not-fire scenarios
- Extract reset instructions into a reusable reset.md step that other steps reference internally
- Reduce parallel tests from 8 to 6 and serial tests from 8 to 6
- Bump version to 1.3.0

Auto-generated by `deepwork install`:
- Added skills for the new infinite_block_tests and reset steps
- Updated existing step skills with the new configuration

- Reset step now runs as a dependency before run_not_fire_tests to ensure a clean environment before any tests begin
- "Should NOT fire" tests now verify that the rules queue is empty after sub-agents complete, confirming the rules truly didn't fire
- Update job description to reflect the 4-step flow with reset first
- Bump version to 1.4.0

- Update Tests 3 & 4 to specify dual criteria: should fire AND should return in reasonable time (enforced via the max_turns limit)
- Add a "Returned in Time?" column to the results tracking table
- Note that the Task tool has no direct timeout, so max_turns is the safeguard against infinite hanging
- Update quality criteria to verify "should NOT fire" and "should fire" test behaviors separately

- Simplify reset step to a single criterion (environment clean)
- Update infinite_block_tests to include a "returned in reasonable time" criterion for no-promise tests
- Keep verbose criteria for run_not_fire_tests and run_fire_tests
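As an illustration only, the pass/fail rules these commits describe can be sketched in Python. Should-not-fire tests require that nothing fired and the rules queue is empty. Should-fire tests must both fire and return in reasonable time. The `TestResult` class, its fields, and the test names are hypothetical and not part of DeepWork. `returned_in_time` simply stands in for a sub-agent finishing within its `max_turns` budget.

```python
from dataclasses import dataclass, field


@dataclass
class TestResult:
    """One row of a hypothetical results-tracking table."""
    name: str
    should_fire: bool
    fired: bool
    # Hypothetical: True if the sub-agent came back within its max_turns budget,
    # since the Task tool itself has no direct timeout.
    returned_in_time: bool
    # Hypothetical: contents of the rules queue after the sub-agent completed.
    rules_queue: list[str] = field(default_factory=list)

    def passed(self) -> bool:
        if self.should_fire:
            # Dual criteria: the rule fired AND the sub-agent returned in time.
            return self.fired and self.returned_in_time
        # "Should NOT fire": nothing fired and the rules queue is empty.
        return not self.fired and not self.rules_queue


# Hypothetical example rows, not real test output.
results = [
    TestResult("infinite block (prompt)", should_fire=True, fired=True, returned_in_time=True),
    TestResult("infinite block (command)", should_fire=True, fired=True, returned_in_time=False),
    TestResult("unrelated edit", should_fire=False, fired=False, returned_in_time=True),
]

print("| Test | Fired? | Returned in Time? | Pass? |")
for r in results:
    print(f"| {r.name} | {r.fired} | {r.returned_in_time} | {r.passed()} |")
```

Keeping the two criteria as separate fields mirrors the separate "Returned in Time?" column: a test that fires but hangs still fails.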
Summary
This PR refactors the manual_tests job to improve maintainability and test organization by extracting common reset logic into a reusable step and moving infinite block tests into a dedicated serial step.
Key Changes
- Version bump: Updated from 1.2.1 to 1.3.0
- New reset step: Created `steps/reset.md` containing centralized reset instructions that other steps can call internally
- New infinite block tests step: Created `steps/infinite_block_tests.md` as a dedicated step for infinite block testing
- Updated `run_not_fire_tests.md`:
  - `model: "haiku"` and `max_turns: 5`
- Updated `run_fire_tests.md`:
  - `model: "haiku"` and `max_turns: 5`
- Updated `job.yml`:
Implementation Details
- The sub-agent configuration (`model: "haiku"`, `max_turns: 5`) is now explicitly documented in all test steps